note that the .Rmd can be found on my personal repo for the corse

Constraining the effects of topography and climate on climate change sensitivity of glaciers in Tibet

The Tibetan Plateau region is home to the greatest amount of non-polar glaciers, which are also highly sensitive to climate change and has recently been experiencing the greatest rates of mass losses. These glaciers are primary freshwater sources to major rivers across Asia, with increased rates of melting increasing the risks of floods and other geo-hazards to some of the world’s most densely populated regions. Yet the geophysical and atmospheric conditions driving changes in Tibetan plateau glacier mass balance remain relatively understudied, with uncertainties in which variables can best predict glacier mass change. In this analysis I will use four different modeling approaches to assess which topographical and climatological drives are most important in explaining changes in glacier equilibrium line altitude (ELA) and whether these drives can be used to explain glacier ELA changes since the little ice age. The first approach uses a Mixed effects model with random effects, where all variables where chosen based on a priori knowledge gained from the readings and an exploratory correlation plot. Following this I constructed an identically structured hierarchical Bayesian model (using the MCMCglm package), to compare the identified importance of variables under a frequentest vs baysian framework. As the exploratory correlation plot showed that many variables displayed high levels of collinearity/multicollinearity, I decided to create a model using principal components from a PCA, to preserve all available variables, while not violating any assumptions of variable collinearity and co-dependence. Finally, I decided to use an automatic variable selection method (MUMIn), where I included the maximum number of variables possible, within the computational limits of my PC. While I believe that the Baysian modeling approaches are best suited to identifying which variables influence ELA and dELA, due to their combination of using pre defined variables chosen based on mechanistic processes and their relatively low BIC values when cross compared. In conclusion I synthesize the results all four models to judge which varibales are most important for predicitng ELA and if ELA shifts since the little ice age can be predicted using topographical and climatological variables.

In this analysis I attempted to justify all modeling structure and variable selection decisions made. As I quite unfamiliar with climatic modeling and glacier dynamics, some of these decisions may be sub-optimal. Yet, these decisions were made to the best of my abilities, without going over the top on background research and overall, my models should have a coherent structure and sufficient work-flow documentation. In none of the preliminary readings, did I encountered the temperature or precipitation in a specific month singled out to be of particular importance for glacier ELA. Therefore, I decided to exclude all monthly climatic variables and only focus on the aggregated factors. Along with the provided variables, I also included a new variable called temperature variability, as my preliminary readings highlighted its potential importance (Brun, et. al., 2017) and I was able to compute it with the provided dataset. While I include p-values throughout my analysis, in this report I interpret variable significance as having a relatively large effect, with a standard error small enough that when ± to the effect size, do not make it cross 0 (the implication being that the effect size is large enough to have a measurable influence, even when considering the uncertainty around the estimate).

1.Data import and wrangeling

glacier <- read_csv("~/Documents/humbolt/semester_01/quantitative_methods/assignments/glacier_sensitivity.csv") %>%
  na.exclude 
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_double(),
##   orientation = col_character()
## )
## ℹ Use `spec()` for the full column specifications.
glacier_scaled <- glacier %>% 
  # adding a new variable for temperature variability; a factor mentioned to be important in multiple papers
  mutate(T_variability = ((T_MAX_mean_monsoon + T_MAX_mean_not_monsoon)-(T_MIN_mean_monsoon + T_MIN_mean_not_monsoon) /2),
          # scaling all varibales used in modelling
         length = scale(length, center = T, scale = T),
         area = scale(area, center = T, scale = T),
         P_snow = scale(P_snow, center = T, scale = T),
         P_year = scale(P_year, center = T, scale = T),
         P_monsoon = scale(P_monsoon, center = T, scale = T),
         P_not_monsoon = scale(P_not_monsoon, center = T, scale = T),
         T_MIN_mean_monsoon = scale(T_MIN_mean_monsoon, center = T, scale = T),
         T_MIN_mean_not_monsoon = scale(T_MIN_mean_not_monsoon, center = T, scale = T),
         T_MAX_mean_monsoon = scale(T_MAX_mean_monsoon, center = T, scale = T),
         T_MAX_mean_not_monsoon = scale(T_MAX_mean_not_monsoon, center = T, scale = T),
         T_variability = scale(T_variability, center = T, scale = T),
         T_mean_mea.yr = scale(T_mean_mea.yr, center = T, scale = T),
         T_mean_monsoon = scale(T_mean_monsoon, center = T, scale = T),
         T_mean_not_monsoon = scale(T_mean_not_monsoon, center = T, scale = T),
         Slope_min = scale(Slope_min, center = T, scale = T),
         Slope_max = scale(Slope_max, center = T, scale = T),
         Slope_mean = scale(Slope_mean, center = T, scale = T),
         Elev_min = scale(Elev_min, center = T, scale = T),
         Elev_max = scale(Elev_max, center = T, scale = T),
         Elev_mean = scale(Elev_mean, center = T, scale = T))





# Here is the for a exploratory correlation plot I used to visually assess my variables to determine what to potentially include in my models, and to identify where strong variable co-dependency existed. I commented out the code as the plot is quite busy and to not detract from the flow of the rest of my analysis. 

#(corr_plot <- ggpairs(data = glacier, columns = c(2:9,22:25,38:39,52:53,66:74)))
#corrplot.mixed(cor1[,2:9,22:25,38:39,52:53,66:74], lower.col = "black" , number.cex = .7)

# Here a number of strong correlations between variables can be seen. Notable is this Summit and Max elevation seem to be the same variable and that particular temperature and precipitation variables correlation with each other. 

Q1. Which are the most important topographical and climatological drivers of glacier equilibrium line altitude (ELA)?

Mixed effects model with random effects

In this first section, I chose to create a Mixed effects model with random effects (lme4 package), where all variables where chosen based on a priori knowledge gained from the readings and the exploratory correlation plot. The methodological advantage of this approach is that through including the random effects, correlations between data coming from specific morphology types and geographical orientations can be accounted for, by treating them as a grouping factor. While this prevents us from explicitly assessing the impacts of morphology and geographic orientation, it still allows us to see how they influence the observed patterns in other variables. In the final section of this analysis (the automated variable selection) morphology and geographic orientation are left as fixed effects, so that I have them once as a fixed and once as a random effect, to allow for comparison between the two scenarios. As the Distribution of ELA had a slight negative skew, I also tried running a GLM with a inverse gaussian distribution (link = “1/mu^2”), but encountered the error that the “PIRLS step-halvings failed to reduce deviance in pwrssUpdate”, to which I did not find a solution.

hist(glacier_scaled$ELA) # there is a negative skew to the data, but overall a gaussian distibution looks appropriate

m_lme4 <- lmer(ELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  T_mean_mea.yr +  T_variability +                      Slope_mean + Elev_max + Elev_mean + (1|morph_type) + (1|orientation), 
               data = glacier_scaled) 
summary(m_lme4)
## Linear mixed model fit by REML ['lmerMod']
## Formula: 
## ELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  
##     T_mean_mea.yr + T_variability + Slope_mean + Elev_max + Elev_mean +  
##     (1 | morph_type) + (1 | orientation)
##    Data: glacier_scaled
## 
## REML criterion at convergence: 9555.8
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -6.1407 -0.4238  0.0367  0.4765  6.2845 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  orientation (Intercept)  129.2   11.37   
##  morph_type  (Intercept)  468.9   21.65   
##  Residual                4819.7   69.42   
## Number of obs: 849, groups:  orientation, 8; morph_type, 6
## 
## Fixed effects:
##               Estimate Std. Error t value
## (Intercept)   5098.664     12.038 423.553
## debris_cov       2.774     11.248   0.247
## length         -23.132      6.984  -3.312
## area           -21.740      6.526  -3.332
## P_snow          20.095      4.940   4.068
## P_monsoon        8.067      4.476   1.802
## P_not_monsoon  -33.825      7.117  -4.753
## T_mean_mea.yr -101.070     10.953  -9.227
## T_variability   59.366      7.939   7.477
## Slope_mean      -6.581      3.339  -1.971
## Elev_max         8.239      5.691   1.448
## Elev_mean      141.943      7.940  17.876
## 
## Correlation of Fixed Effects:
##             (Intr) dbrs_c length area   P_snow P_mnsn P_nt_m T_mn_. T_vrbl
## debris_cov  -0.096                                                        
## length      -0.059 -0.044                                                 
## area        -0.031 -0.060 -0.722                                          
## P_snow       0.022 -0.004 -0.039  0.069                                   
## P_monsoon   -0.006  0.039 -0.163  0.077  0.016                            
## P_not_monsn -0.026  0.023  0.126 -0.127 -0.713 -0.546                     
## T_mean_m.yr  0.024 -0.081  0.021 -0.085  0.026 -0.047 -0.138              
## T_variablty  0.008 -0.013 -0.058  0.124  0.256  0.115 -0.116 -0.689       
## Slope_mean   0.059  0.005  0.095  0.177  0.066 -0.061  0.025 -0.384  0.177
## Elev_max    -0.012 -0.131 -0.319 -0.191 -0.028  0.010  0.090 -0.092  0.165
## Elev_mean    0.031 -0.016  0.160  0.041 -0.109 -0.094  0.132  0.612 -0.105
##             Slp_mn Elv_mx
## debris_cov               
## length                   
## area                     
## P_snow                   
## P_monsoon                
## P_not_monsn              
## T_mean_m.yr              
## T_variablty              
## Slope_mean               
## Elev_max    -0.477       
## Elev_mean   -0.067 -0.371

The most important fixed effects where mean elevation and mean temperature. A more detailed description and visualization can be found in the dot and whiskar plot at the end of this section. Some variables such a snow and precipitation in non monsoon phases, as well as mean annual temperature and temperature variability snowed some correlation, but as I assess these to have different mechanistic reasons to be included, I decided to not alter my model structure.

# R squared
r.squaredGLMM(m_lme4)
## Warning: 'r.squaredGLMM' now calculates a revised statistic. See the help page.
##            R2m       R2c
## [1,] 0.8996553 0.9107324

Here we mainly need to focus on the The R²(C), which is 90.78 %. The C represents the Conditional R_GLMM² and can be interpreted as a variance explained by the entire model, including both fixed and random effects

# checking out variable co-linearity 
plot(m_lme4)

The residual vs fitted plot shows a mostly flat and even distribution of points, with greater distances seen at the lower spectrum of points, likely due to the negative skew described above

qqnorm(resid(m_lme4))
qqline(resid(m_lme4))

When checking the QQ-plot it is seen that there is a left tail at lower bonds and a right tail at the upper bonds. Ideal would be to use a distribution that better fits the data (such as a inverse gussian distribution), which unfortunately was not possible. Following an ad hoc transformation of the dependent variable ELA, this observed trend only showed a very marginal improvement, so I decided to keep ELA in its un-transformed state.

# random effects
plot_model(m_lme4, type = "re", show.values = TRUE, vline.color = "black")
## [[1]]

## 
## [[2]]

In the summary it can be seen that overall both random effects orientation and morphology type account for relatively large amounts of variation, with Morphology being the stronger predictor (variance 468,9; std dev. 21.65), vs orientation (variance 129.2, std dev. 11.37). Yet in the visualization of the random effect we can see that that for neither orientation or morphology, is are any individual groups with a large outlying effect, as all effect sizes ± standard errors either are very close or cross 0.

ggcoefstats(x = m_lme4, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F) 

In the dot-whisker visual representation of the effect sizes and standard errors of our the variables we can clearly see that mean elevation and mean annual temperature are the two most influential variables at predicting ELA. These are then followed by precipitation in non-monsoon phases, as well as the temperature variability variable that I aggregated from mean temperature data.

Mixed effects model with random effects dELA

hist(glacier_scaled$dELA)

There is a stronger skew to the data. While again an inverse Gaussian distribution seems appropriate, this was also not possible. Instead I compared a the model with a normal Gaussian and a Poisson distribution and found that the Poisson had a better distribution of residuals.

m_lme4 <-lmer(ELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  T_mean_mea.yr +  T_variability +                      Slope_mean + Elev_max + Elev_mean + (1 | morph_type) + (1 | orientation),
    data = glacier_scaled)

m_lme4_d <-glmer(dELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  T_mean_mea.yr +  T_variability +  Slope_mean + Elev_max + Elev_mean + (1 | morph_type) + (1 | orientation),
    family = poisson,
    data = glacier_scaled)
summary(m_lme4_d)
## Generalized linear mixed model fit by maximum likelihood (Laplace
##   Approximation) [glmerMod]
##  Family: poisson  ( log )
## Formula: 
## dELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  
##     T_mean_mea.yr + T_variability + Slope_mean + Elev_max + Elev_mean +  
##     (1 | morph_type) + (1 | orientation)
##    Data: glacier_scaled
## 
##      AIC      BIC   logLik deviance df.resid 
##  26667.4  26733.8 -13319.7  26639.4      835 
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -9.4921 -3.6701 -0.7578  2.7556 30.8456 
## 
## Random effects:
##  Groups      Name        Variance Std.Dev.
##  orientation (Intercept) 0.01299  0.1140  
##  morph_type  (Intercept) 0.67998  0.8246  
## Number of obs: 849, groups:  orientation, 8; morph_type, 6
## 
## Fixed effects:
##                Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    4.297871   0.342767  12.539  < 2e-16 ***
## debris_cov    -0.287961   0.016273 -17.696  < 2e-16 ***
## length         0.022230   0.009986   2.226 0.026009 *  
## area          -0.004885   0.009229  -0.529 0.596575    
## P_snow         0.034998   0.007133   4.907 9.27e-07 ***
## P_monsoon      0.047429   0.006502   7.294 3.00e-13 ***
## P_not_monsoon -0.104107   0.010147 -10.260  < 2e-16 ***
## T_mean_mea.yr  0.230208   0.014964  15.384  < 2e-16 ***
## T_variability -0.022877   0.010742  -2.130 0.033192 *  
## Slope_mean     0.091553   0.004734  19.339  < 2e-16 ***
## Elev_max       0.059419   0.007670   7.747 9.41e-15 ***
## Elev_mean      0.038329   0.010282   3.728 0.000193 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) dbrs_c length area   P_snow P_mnsn P_nt_m T_mn_. T_vrbl
## debris_cov  -0.002                                                        
## length      -0.007 -0.040                                                 
## area        -0.002 -0.071 -0.706                                          
## P_snow       0.004 -0.015 -0.039  0.075                                   
## P_monsoon    0.002  0.052 -0.184  0.067  0.010                            
## P_not_monsn -0.005  0.028  0.151 -0.135 -0.718 -0.552                     
## T_mean_m.yr  0.001 -0.093 -0.020 -0.080  0.011 -0.036 -0.105              
## T_variablty  0.000 -0.017 -0.047  0.125  0.254  0.168 -0.175 -0.736       
## Slope_mean   0.003  0.005  0.119  0.167  0.078 -0.101  0.023 -0.419  0.215
## Elev_max    -0.001 -0.105 -0.322 -0.194 -0.034  0.047  0.063 -0.049  0.129
## Elev_mean    0.000 -0.052  0.134  0.040 -0.149 -0.039  0.145  0.605 -0.181
##             Slp_mn Elv_mx
## debris_cov               
## length                   
## area                     
## P_snow                   
## P_monsoon                
## P_not_monsn              
## T_mean_m.yr              
## T_variablty              
## Slope_mean               
## Elev_max    -0.482       
## Elev_mean   -0.097 -0.359

In the summary it can be seen that neither random effects orientation and morphology type have significant influence on ELA, with Morphology as both have relatively small effect sizes with standard errors that make the estimate cross 0. The most important fixed effects where mean temperature and debris cover. A more detailed description and visualization can be found in the dot and whisker plot at the end of this section. Some variables such a snow and precipitation in non monsoon phases, as well as mean annual temperature and temperature variability snowed some correlation, but as I assess these to have different mechanistic reasons to be included, I decided to not alter my model structure.

# R squared (conditional)
r.squaredGLMM(m_lme4_d)
## Warning: The null model is correct only if all variables used by the original
## model remain unchanged.
##                  R2m       R2c
## delta     0.04899643 0.9864697
## lognormal 0.04899972 0.9865360
## trigamma  0.04899310 0.9864028

The R²(C) is 98.65 %, where the C represents the Conditional R_GLMM² and can be interpreted as a variance explained by the entire model, including both fixed and random effects

# checking out variable co-linearity 
plot(m_lme4_d)

The residual vs fitted plot shows a mostly flat and even distribution of points, with some outlying point at intermediate values

qqnorm(resid(m_lme4_d))

When checking the QQ-plot it is seen that there is slight tail at the upper bound, but overall this is a significant improvement over the Gaussian distribution.

# random effects 
plot_model(m_lme4_d, type = "re", show.values = TRUE, vline.color = "black")
## [[1]]

## 
## [[2]]

In the summary it can be seen that both random effects orientation and morphology type relatively account for a small amount of overall variation, with Morphology being the stronger predictor (variance 0.68; std dev. 0.82), vs orientation (variance 0.012, std dev. 0.11). Note that here the random effects are not centered around 0 but 1, as I used a Possion distribution for this model. In the visualization of the random effect we can see that that for orientation there are no groups with a large outlying effect, as all effect sizes ± standard errors either are very close or cross 0. On the other hand, for morphology type we can see that class 6 has a uncharacteristically strong negative influence on dELA, indicating that it may be a driver of glacier loss.

# Variable effect size visualization
ggcoefstats(x = m_lme4_d, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F) 

In the dot-whiskar visual representation of the effect sizes and standard errors of our the variables we can clearly see that as before (normal ELA) mean annual temperature is an highly influential variable at predicting ELA. This is followed by followed by the previously identified variable of precipitation in non-monsoon phases, as well as the new debris cover variable and mean elevation.

Bayesian Models

MCMCglmm is an altnerative modeling package to Rstan (brms), that I selected to use as I am more family with its practical implementation and as I have been unable to install stan successfully in the past. Both packages use a Bayesian statistical framework in combination with Markov chain Monte Carlo simulations. The default prior values assumed in a MCMCglmm are posterior distribution with very large variance values for the fixed effects and a flat (weakly informative) prior. The variances for the random effects are assumed to have inverse-Wishart priors with a very low value for nu (weakly informative). I decided to not delve deeper into assigning my own priors, due to me having i) little prior knowledge the dynamics / expected values and distributions of variables and ii) relatively shallow knowledge of the mathematical underpinnings and practical implementation methods for assigning more informative priors.

m_bays <- MCMCglmm(ELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  T_mean_mea.yr +  T_variability +                      Slope_mean + Elev_max + Elev_mean, random = ~ morph_type + orientation, 
               data = glacier_scaled) 
## Warning: Unknown or uninitialised column: `family`.
## Warning: Setting row names on a tibble is deprecated.
## 
##                        MCMC iteration = 0
## 
##                        MCMC iteration = 1000
## 
##                        MCMC iteration = 2000
## 
##                        MCMC iteration = 3000
## 
##                        MCMC iteration = 4000
## 
##                        MCMC iteration = 5000
## 
##                        MCMC iteration = 6000
## 
##                        MCMC iteration = 7000
## 
##                        MCMC iteration = 8000
## 
##                        MCMC iteration = 9000
## 
##                        MCMC iteration = 10000
## 
##                        MCMC iteration = 11000
## 
##                        MCMC iteration = 12000
## 
##                        MCMC iteration = 13000
# The model runs and converges after the default 13000 iterations

summary(m_bays)
## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 9644.451 
## 
##  G-structure:  ~morph_type
## 
##            post.mean  l-95% CI u-95% CI eff.samp
## morph_type     661.8 1.937e-10     2409    584.9
## 
##                ~orientation
## 
##             post.mean  l-95% CI u-95% CI eff.samp
## orientation     85.99 1.063e-16    344.2    310.4
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units      4896     4432     5370    502.8
## 
##  Location effects: ELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon + T_mean_mea.yr + T_variability + Slope_mean + Elev_max + Elev_mean 
## 
##               post.mean l-95% CI u-95% CI eff.samp  pMCMC    
## (Intercept)    5099.632 5073.123 5123.610    804.6 <0.001 ***
## debris_cov        5.187  -18.061   25.510   1000.0  0.646    
## length          -24.554  -37.676  -10.694    894.5 <0.001 ***
## area            -22.418  -35.541  -11.543   1000.0 <0.001 ***
## P_snow           19.550    9.746   28.685   1000.0 <0.001 ***
## P_monsoon         9.726   -0.658   18.180    280.0  0.050 .  
## P_not_monsoon   -33.734  -47.420  -19.942    898.6 <0.001 ***
## T_mean_mea.yr   -95.052 -117.068  -69.611    152.8 <0.001 ***
## T_variability    56.454   41.028   72.149    100.9 <0.001 ***
## Slope_mean       -7.296  -13.988   -0.882   1000.0  0.028 *  
## Elev_max          9.233   -1.399   20.548   1000.0  0.112    
## Elev_mean       145.283  127.854  161.175    176.5 <0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As before in the hierarchical GLM, it is seen that both morphology and orientation account for some variance in ELA. From the postierier mean (visualization in following code chunk), we can see that that like in the hierarchical GLM, mean elevation and mean annual temperature are the strongest predictors of ELA (as its postierier mean in centered farthest from zero). The aggregated variable temperature variability was the also a good predictor of ELA.

# R squared
#r.squaredGLMM(m_bays)

Unfortunately I was not able to figure out access using the r2 with a predefined function, or an official way to compute it. Therefore For me Bayesian models I do not include r2 values.

# postirior distribution of random effects
par(mfrow = c(1,2))
hist(mcmc(m_bays$VCV)[,"morph_type"])
hist(mcmc(m_bays$VCV)[,"orientation"])

The variance can not be equal to zero, but as the mean value is up against zero, this can be interpreted as the effect not being of great significance (which corresponds with the results of the hierarchical linear model). As the spread of the histograms are relatively narrow, it can be assumed that the distribution is relatively accurate.

# Assesing model convergence through plotting trace and density est. for the intercept
plot(m_bays$Sol)

The trace can be interpreted as the a pseudo time-series of what the model was doing in each of its iterations. As all traces have the “fuzzy caterpillar” appearance, it can be assumed there is adequate mixing with overall convergence being met. The density plot shows the postierier distribution of each the model predicted for each variable in each iteration. If the distribution crosses zero, the variable can be assumed to be non-significant and the spread of the distribution conveys the overall accuracy of the predicted variable, as well as the probability of effect size being “true”.

# Variable effect size visualization
ggcoefstats(x = m_bays, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", robust = TRUE, exclude.intercept = T)
## Warning: Can't extract 'deviance' residuals. Returning response residuals.

In the dot-whisker visual representation of the effect sizes and standard errors of our the variables we can clearly see that as before (Hierarchical GLM), the mean elevation and mean annual temperature are highly influential variables at predicting ELA. This is followed by followed by the also previously identified variables of temperature variability and precipitation in non-monsoon phases. The correspondence between the hierarchical GLM and Bayesian model in terms of the magnitude of how variables influence ELA makes intuitive sense, as largely the models largely share the same hierarchical structure and only slightly deviate in how effect sizes and standard errors are calculated. The lack of major differences between the two models is also accentuated by the fact the only weak (non-informative) priors were selected for the Bayesian model.

Bayesian dELA

m_bays_d <- MCMCglmm(dELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon +  T_mean_mea.yr +  T_variability + Slope_mean + Elev_max + Elev_mean, random = ~ morph_type + orientation, 
                       family = "poisson",
                     data = glacier_scaled) 
## 
##                        MCMC iteration = 0
## 
##  Acceptance ratio for liability set 1 = 0.000220
## 
##                        MCMC iteration = 1000
## 
##  Acceptance ratio for liability set 1 = 0.441896
## 
##                        MCMC iteration = 2000
## 
##  Acceptance ratio for liability set 1 = 0.455330
## 
##                        MCMC iteration = 3000
## 
##  Acceptance ratio for liability set 1 = 0.466432
## 
##                        MCMC iteration = 4000
## 
##  Acceptance ratio for liability set 1 = 0.509494
## 
##                        MCMC iteration = 5000
## 
##  Acceptance ratio for liability set 1 = 0.509485
## 
##                        MCMC iteration = 6000
## 
##  Acceptance ratio for liability set 1 = 0.509521
## 
##                        MCMC iteration = 7000
## 
##  Acceptance ratio for liability set 1 = 0.509187
## 
##                        MCMC iteration = 8000
## 
##  Acceptance ratio for liability set 1 = 0.510649
## 
##                        MCMC iteration = 9000
## 
##  Acceptance ratio for liability set 1 = 0.509330
## 
##                        MCMC iteration = 10000
## 
##  Acceptance ratio for liability set 1 = 0.510311
## 
##                        MCMC iteration = 11000
## 
##  Acceptance ratio for liability set 1 = 0.510081
## 
##                        MCMC iteration = 12000
## 
##  Acceptance ratio for liability set 1 = 0.510303
## 
##                        MCMC iteration = 13000
## 
##  Acceptance ratio for liability set 1 = 0.510100
# The model runs and converges after the default 13000 iterations

summary(m_bays_d)
## 
##  Iterations = 3001:12991
##  Thinning interval  = 10
##  Sample size  = 1000 
## 
##  DIC: 7017.478 
## 
##  G-structure:  ~morph_type
## 
##            post.mean  l-95% CI u-95% CI eff.samp
## morph_type  0.002907 3.561e-17 0.001499     1000
## 
##                ~orientation
## 
##             post.mean  l-95% CI u-95% CI eff.samp
## orientation  0.007013 1.699e-16  0.02354    309.8
## 
##  R-structure:  ~units
## 
##       post.mean l-95% CI u-95% CI eff.samp
## units     0.292   0.2641   0.3234    847.5
## 
##  Location effects: dELA ~ debris_cov + length + area + P_snow + P_monsoon + P_not_monsoon + T_mean_mea.yr + T_variability + Slope_mean + Elev_max + Elev_mean 
## 
##               post.mean  l-95% CI  u-95% CI eff.samp  pMCMC    
## (Intercept)    4.471700  4.396902  4.551156   1000.0 <0.001 ***
## debris_cov    -0.194139 -0.360286 -0.033519   1000.0  0.024 *  
## length         0.018940 -0.091080  0.124414   1000.0  0.672    
## area          -0.057611 -0.171369  0.032298   1000.0  0.254    
## P_snow         0.029123 -0.041720  0.114626   1000.0  0.462    
## P_monsoon      0.069296  0.002700  0.149383    677.9  0.058 .  
## P_not_monsoon -0.125731 -0.240085 -0.013529    836.9  0.034 *  
## T_mean_mea.yr  0.382773  0.207533  0.572074    428.3 <0.001 ***
## T_variability -0.078725 -0.213707  0.044303    699.7  0.202    
## Slope_mean     0.077949  0.021030  0.129824   1000.0  0.002 ** 
## Elev_max       0.070856 -0.007976  0.162023    901.7  0.104    
## Elev_mean      0.127486  0.003874  0.245267    586.0  0.032 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As before in the hierarchical GLM, it is seen that both morphology and orientation account for some variance in ELA. From the postierier mean (visualization in following code chunk), we can see that that like in the hierarchical GLM, mean annual temperature is the strongest predictor of ELA (as its postierier mean in centered farthest from zero). Additionally mean elevation, debris cover, and precipitation during non monsoon periods are also seen to be influential variables. A more detailed description can be found at the end of the section in the dot whiskar plot.

# R squared
#r.squaredGLMM(m_bays_d)

Unfortunately I was not able to figure out access using the r2 with a predefined function, or an official way to compute it. Therefore For me Bayesian models I do not include r2 values.

# posterior distribution of random effects
par(mfrow = c(1,2))
hist(mcmc(m_bays_d$VCV)[,"morph_type"])
hist(mcmc(m_bays_d$VCV)[,"orientation"])

The variance can not be equal to zero, but as the mean value is up against zero, this can be interpreted as the effect not being of great significance. As the spread of the histograms are relatively narrow, it can be assumed that the distribution is relatively accurate.

# Assessing model convergence through plotting trace and density est. for the intercept
plot(m_bays_d$Sol)

The trace can be interpreted as the a pseudo time-series of what the model was doing in each of its iterations. As all traces have the fuzzy caterpillar appearance, it can be assumed there is adequate mixing with overall convergence being met. The density plot shows the postierer distribution of each the model predicted for each variable in each iteration. If the distribution crosses zero, the variable can be assumed to be non-significant and the spread of the distribution conveys the overall accuracy of the predicted variable, as well as the probability of effect size being “true”. .

# Variable effect size visualization
ggcoefstats(x = m_bays_d, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", robust = TRUE, exclude.intercept = T)
## Warning: Can't extract 'deviance' residuals. Returning response residuals.

In the dot-whisker visual representation of the effect sizes and standard errors of our the variables we can clearly see that as before (Hierarchical GLM), the mean annual temperature and debris cover are highly influential variables at predicting dELA. This is followed by followed by the also previously identified variable of precipitation in non-monsoon phases. The correspondence between the hierarchical GLM and Bayesian model in terms of the magnitude of how variables influence dELA makes intuitive sense, as largely the models largely share the same hierarchical structure and only slightly deviate in how effect sizes and standard errors are calculated. The lack of major differences between the two models is also accentuated by the fact the only weak (non-informative) priors were selected for the Bayesian model.

PCA

In an exploratory correlation plot I found that many variables displayed high levels of collinearity/multicollinearity. Therefore, I decided to create a model using the principal components from a PCA, which has the advantage of being able to preserve all available variables, while not violating any assumptions of variable collinearity and co-dependence. Using the rotation function to assess how strongly each variable loaded on to the computed PC, I created a characterization of the which variables each PC was analogous to. I selection critireon for the number of PC included in the final model was a visual assessment of the cumulative sum of explained variation, relying on the “elbow-method”.

# pca # need to redo analysis/description of pca plots
pca <- glacier_scaled %>% 
  dplyr::select(c(4:9,22:25,38:39,52:53,66:75), -orientation) # need to remove as it is a character
str(pca)
## tibble[,23] [849 × 23] (S3: tbl_df/tbl/data.frame)
##  $ summit                : num [1:849] 5710 5750 5715 5760 5675 ...
##  $ debris_cov            : num [1:849] 0 0 0 0 0 0 0 0 0 0 ...
##  $ morph_type            : num [1:849] 2 2 2 3 2 2 2 2 2 2 ...
##  $ length                : num [1:849, 1] -0.386 -0.284 -0.548 -0.579 0.295 ...
##   ..- attr(*, "scaled:center")= num 6216
##   ..- attr(*, "scaled:scale")= num 5539
##  $ area                  : num [1:849, 1] -0.3439 -0.2619 -0.4021 -0.4901 0.0947 ...
##   ..- attr(*, "scaled:center")= num 1393269
##   ..- attr(*, "scaled:scale")= num 2117729
##  $ P_snow                : num [1:849, 1] -0.5453 -0.2486 -0.0552 0.9086 0.1799 ...
##   ..- attr(*, "scaled:center")= num 120
##   ..- attr(*, "scaled:scale")= num 50.7
##  $ P_year                : num [1:849, 1] -1.211 -0.811 -0.735 -0.782 -0.027 ...
##   ..- attr(*, "scaled:center")= num 889
##   ..- attr(*, "scaled:scale")= num 177
##  $ P_monsoon             : num [1:849, 1] -1.121 -0.593 -0.599 -0.564 0.481 ...
##   ..- attr(*, "scaled:center")= num 638
##   ..- attr(*, "scaled:scale")= num 94.4
##  $ P_not_monsoon         : num [1:849, 1] -1.162 -0.936 -0.788 -0.912 -0.538 ...
##   ..- attr(*, "scaled:center")= num 251
##   ..- attr(*, "scaled:scale")= num 93.3
##  $ T_MIN_mean_monsoon    : num [1:849, 1] -1.18 -1.59 -1.41 -1.72 -1.43 ...
##   ..- attr(*, "scaled:center")= num 1.13
##   ..- attr(*, "scaled:scale")= num 1.08
##  $ T_MIN_mean_not_monsoon: num [1:849, 1] -1.24 -1.61 -1.43 -1.7 -1.47 ...
##   ..- attr(*, "scaled:center")= num -11.9
##   ..- attr(*, "scaled:scale")= num 1.3
##  $ T_MAX_mean_monsoon    : num [1:849, 1] -1.08 -1.59 -1.38 -1.75 -1.36 ...
##   ..- attr(*, "scaled:center")= num 7.81
##   ..- attr(*, "scaled:scale")= num 0.892
##  $ T_MAX_mean_not_monsoon: num [1:849, 1] -1.16 -1.62 -1.41 -1.76 -1.46 ...
##   ..- attr(*, "scaled:center")= num -2.75
##   ..- attr(*, "scaled:scale")= num 1.04
##  $ T_mean_mea.yr         : num [1:849, 1] -1.18 -1.61 -1.42 -1.74 -1.45 ...
##   ..- attr(*, "scaled:center")= num -2.4
##   ..- attr(*, "scaled:scale")= num 1.08
##  $ T_mean_monsoon        : num [1:849, 1] -1.13 -1.6 -1.41 -1.74 -1.4 ...
##   ..- attr(*, "scaled:center")= num 4.47
##   ..- attr(*, "scaled:scale")= num 0.983
##  $ T_mean_not_monsoon    : num [1:849, 1] -1.21 -1.61 -1.43 -1.74 -1.47 ...
##   ..- attr(*, "scaled:center")= num -7.31
##   ..- attr(*, "scaled:scale")= num 1.16
##  $ Slope_min             : num [1:849, 1] 0.69002 -0.16035 0.41288 0.00442 -0.79263 ...
##   ..- attr(*, "scaled:center")= num 8.45
##   ..- attr(*, "scaled:scale")= num 6.45
##  $ Slope_max             : num [1:849, 1] -0.138 -0.748 -0.6 -0.855 -1.09 ...
##   ..- attr(*, "scaled:center")= num 46.2
##   ..- attr(*, "scaled:scale")= num 10.4
##  $ Slope_mean            : num [1:849, 1] -0.359 -0.925 -0.342 -0.431 -1.285 ...
##   ..- attr(*, "scaled:center")= num 27.5
##   ..- attr(*, "scaled:scale")= num 6.44
##  $ Elev_min              : num [1:849, 1] 0.837 0.901 1.071 1.336 0.896 ...
##   ..- attr(*, "scaled:center")= num 4800
##   ..- attr(*, "scaled:scale")= num 407
##  $ Elev_max              : num [1:849, 1] 0.641 0.86 0.699 0.991 0.548 ...
##   ..- attr(*, "scaled:center")= num 5505
##   ..- attr(*, "scaled:scale")= num 292
##  $ Elev_mean             : num [1:849, 1] 0.994 1.377 1.464 1.71 1.191 ...
##   ..- attr(*, "scaled:center")= num 5155
##   ..- attr(*, "scaled:scale")= num 221
##  $ T_variability         : num [1:849, 1] -0.929 -1.527 -1.286 -1.73 -1.291 ...
##   ..- attr(*, "scaled:center")= num 10.4
##   ..- attr(*, "scaled:scale")= num 0.782
##  - attr(*, "na.action")= 'exclude' Named int [1:4] 731 739 793 829
##   ..- attr(*, "names")= chr [1:4] "731" "739" "793" "829"
pca <- prcomp(pca, scale. = T)
summary(pca)
## Importance of components:
##                          PC1    PC2    PC3     PC4     PC5     PC6     PC7
## Standard deviation     3.239 2.1197 1.7267 1.26260 1.03151 0.82279 0.68121
## Proportion of Variance 0.456 0.1954 0.1296 0.06931 0.04626 0.02943 0.02018
## Cumulative Proportion  0.456 0.6514 0.7810 0.85032 0.89658 0.92601 0.94619
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.57066 0.52564 0.41415 0.37528 0.32194 0.28148 0.25031
## Proportion of Variance 0.01416 0.01201 0.00746 0.00612 0.00451 0.00344 0.00272
## Cumulative Proportion  0.96035 0.97236 0.97982 0.98594 0.99045 0.99389 0.99662
##                           PC15    PC16    PC17    PC18     PC19     PC20
## Standard deviation     0.20272 0.15590 0.11041 0.01307 0.006805 0.004639
## Proportion of Variance 0.00179 0.00106 0.00053 0.00001 0.000000 0.000000
## Cumulative Proportion  0.99840 0.99946 0.99999 1.00000 1.000000 1.000000
##                            PC21      PC22      PC23
## Standard deviation     9.62e-08 9.493e-16 5.442e-16
## Proportion of Variance 0.00e+00 0.000e+00 0.000e+00
## Cumulative Proportion  1.00e+00 1.000e+00 1.000e+00
pca$rotation[,1:6]
##                                  PC1          PC2          PC3           PC4
## summit                 -0.1156595114 -0.396530022  0.006722996 -0.1575511536
## debris_cov              0.0592119096 -0.269462680 -0.052365271  0.0841236946
## morph_type              0.0400162434 -0.091990076 -0.029584626  0.0998727770
## length                  0.0483474510 -0.419110439  0.066102380  0.2009610901
## area                    0.0498102447 -0.403993717  0.069942759  0.2063360677
## P_snow                  0.0092334579  0.044215327  0.512080560 -0.1067217199
## P_year                  0.1663450945  0.026036585  0.473270085 -0.0983638200
## P_monsoon               0.1149693609 -0.015678340  0.482582772 -0.1269818451
## P_not_monsoon           0.1989915206  0.065216422  0.408839901 -0.0579761041
## T_MIN_mean_monsoon      0.3060541207  0.012778292 -0.053013928  0.0008543518
## T_MIN_mean_not_monsoon  0.3032010943 -0.016791648 -0.044913670 -0.0236876350
## T_MAX_mean_monsoon      0.2981733398  0.062127235 -0.096162241  0.0460564355
## T_MAX_mean_not_monsoon  0.3038968050  0.008468283 -0.074287604  0.0015814723
## T_mean_mea.yr           0.3054852777  0.009768718 -0.064004869  0.0001986579
## T_mean_monsoon          0.3041721614  0.035272945 -0.072790921  0.0211532886
## T_mean_not_monsoon      0.3047792037 -0.005670976 -0.058388518 -0.0124484181
## Slope_min              -0.0008673191  0.172349229 -0.144086417 -0.5497905945
## Slope_max               0.1074204433 -0.340938679 -0.052861818 -0.2748845202
## Slope_mean              0.0969690782 -0.129444291 -0.151594973 -0.6427029063
## Elev_min               -0.2245285603  0.284072988  0.023014192 -0.0242427371
## Elev_max               -0.1272311906 -0.398418302  0.024459017 -0.1469850576
## Elev_mean              -0.2903997157 -0.014890301  0.052015325 -0.1132050921
## T_variability           0.2804581157  0.087127203 -0.134391044  0.0736330210
##                                  PC5          PC6
## summit                  0.0824178265 -0.013314116
## debris_cov             -0.1631885443 -0.921362046
## morph_type             -0.8875920506  0.231071412
## length                 -0.0376147737  0.144502015
## area                   -0.1262642924  0.122834120
## P_snow                 -0.1045303257 -0.040817253
## P_year                 -0.0006254728 -0.028488009
## P_monsoon               0.0538899676 -0.034703229
## P_not_monsoon          -0.0557100478 -0.018888474
## T_MIN_mean_monsoon      0.0092062920 -0.004116058
## T_MIN_mean_not_monsoon  0.0024765380  0.003884554
## T_MAX_mean_monsoon      0.0178121599 -0.010498486
## T_MAX_mean_not_monsoon  0.0277241682 -0.001477082
## T_mean_mea.yr           0.0136610122 -0.001637198
## T_mean_monsoon          0.0131736061 -0.006805675
## T_mean_not_monsoon      0.0138881375  0.001490011
## Slope_min              -0.3032269288 -0.114844999
## Slope_max               0.1929150277  0.181647499
## Slope_mean             -0.0472596556  0.047772626
## Elev_min               -0.0550247898 -0.073168174
## Elev_max                0.0741599741  0.021785456
## Elev_mean              -0.0209235332 -0.053078593
## T_variability           0.0487193561 -0.014295989

Based on this rotation, I will make the following assumption for which variables are analogs to each PC PC1: All temperature variables (both monsoon & non-monsoon), temperature variability, and elevation mean PC2: Geophysical conditions such as min/max elevation, debris cover, catchment length and area, and max slope PC3: All precipitation variables PC4: Slope, and to a far lesser degree catchment area, length and elevation PC5: Morphology type

var_exp <- data.frame(pc = c(1:23),
                      var_exp = pca$sdev^2 / sum(pca$sdev^2))

# add the variances cumulatively
var_exp$var_exp_cumsum <- cumsum(var_exp$var_exp)
var_exp
##    pc      var_exp var_exp_cumsum
## 1   1 4.560124e-01      0.4560124
## 2   2 1.953597e-01      0.6513721
## 3   3 1.296334e-01      0.7810055
## 4   4 6.931112e-02      0.8503167
## 5   5 4.626144e-02      0.8965781
## 6   6 2.943396e-02      0.9260121
## 7   7 2.017606e-02      0.9461881
## 8   8 1.415870e-02      0.9603468
## 9   9 1.201310e-02      0.9723599
## 10 10 7.457546e-03      0.9798175
## 11 11 6.123134e-03      0.9859406
## 12 12 4.506432e-03      0.9904470
## 13 13 3.444813e-03      0.9938918
## 14 14 2.724236e-03      0.9966161
## 15 15 1.786818e-03      0.9984029
## 16 16 1.056764e-03      0.9994597
## 17 17 5.299700e-04      0.9999896
## 18 18 7.426372e-06      0.9999971
## 19 19 2.013656e-06      0.9999991
## 20 20 9.358421e-07      1.0000000
## 21 21 4.023442e-16      1.0000000
## 22 22 3.917793e-32      1.0000000
## 23 23 1.287748e-32      1.0000000
ggplot(var_exp) +
  geom_bar(aes(x = pc, y = var_exp), stat = "identity") +
  geom_line(aes(x = pc, y = var_exp_cumsum)) +
  theme_classic()

Following the Elbow method, it seems like 3-5 PCs seems like an appropriate cutoff point. Despite being more difficult to interpret the final results, I decided to include 5 PCs, due to PCs 4 and 5 increasing the overall variance explained from ~80% to 90%. This also helps to ensure that the large number of variables in the model are captured in the PCs, especially as morphology type almost exclusively loads on PC5.

# scores
pca_scores <- data.frame(pca$x)
head(pca_scores)
##         PC1         PC2         PC3        PC4        PC5         PC6
## 1 -4.085950  0.12736786 -1.15488972 -0.4276191  0.8461927 -0.19220332
## 2 -5.283278 -0.02000121  0.10140922  0.2929743  0.9265436 -0.25750181
## 3 -4.742488  0.34258840 -0.03086408 -0.5009767  0.7503837 -0.36280031
## 4 -5.739056  0.13114428  0.67963939 -0.2685366 -0.6152369 -0.08876674
## 5 -4.418274 -0.03063813  1.46720119  1.0095502  0.9809866 -0.22197371
## 6 -3.354310  0.37954311  0.86186526  0.4488272  1.1837522 -0.16445739
##          PC7        PC8         PC9         PC10        PC11         PC12
## 1 -0.5924383  0.3157648 -0.06914327  0.040799740 -0.33150857  0.003904768
## 2 -0.4260974 -0.0194212  0.07569821 -0.284495967 -0.08137097 -0.125315391
## 3 -0.4281015  0.1150437  0.10357952  0.006878345  0.01024527 -0.130232249
## 4  0.3062982  0.4004047  0.59410478 -0.129178425  0.05062106  0.089638527
## 5 -0.4874418 -0.4850412 -0.13692691 -0.028121838  0.24060972 -0.082023890
## 6  0.2822511 -0.2910021 -0.38920635 -0.053532077  0.18451985 -0.127911367
##          PC13         PC14        PC15        PC16        PC17        PC18
## 1 -0.06771404 -0.007749453  0.03161404 -0.08157295 -0.02674952 -0.02366540
## 2 -0.12582001  0.104398650  0.04780135 -0.07585605 -0.01790055 -0.01942436
## 3 -0.16104792  0.015764018  0.06123426 -0.03189586 -0.02538805 -0.01376238
## 4  0.16573447 -0.050168883  0.09239933 -0.16075389 -0.01365135 -0.02749751
## 5  0.19389144  0.139088085 -0.01502534 -0.08903185  0.07761670 -0.01844080
## 6  0.31287542 -0.240958925 -0.19863270  0.03489515  0.11186253 -0.01939790
##           PC19          PC20          PC21          PC22          PC23
## 1  0.008769504  0.0026651323 -2.861286e-08 -5.551115e-17  1.387779e-16
## 2  0.001274638  0.0082995976  7.317158e-08 -2.220446e-16  2.220446e-16
## 3 -0.002945277  0.0004759622 -4.582566e-08  2.220446e-16  3.608225e-16
## 4  0.001010093 -0.0047553419  2.704832e-07 -1.110223e-16  3.053113e-16
## 5  0.003278927 -0.0033848377  1.013831e-07 -6.661338e-16 -1.249001e-16
## 6  0.003915545 -0.0039411928 -2.274929e-07 -1.110223e-16 -8.326673e-17
pca_scores_df <- cbind(glacier_scaled, pca_scores)
#ggpairs(pca_scores_df[,c(4:6,8:9,22:25,38:39,52:53,66:75,76:80)]) # first 5 Pcs

m_pca <- glm(ELA ~ PC1 + PC2 + PC3 + PC4 + PC5, data = pca_scores_df)
summary(m_pca)
## 
## Call:
## glm(formula = ELA ~ PC1 + PC2 + PC3 + PC4 + PC5, data = pca_scores_df)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -365.98   -41.14    -3.62    41.31   562.01  
## 
## Coefficients:
##              Estimate Std. Error  t value Pr(>|t|)    
## (Intercept) 5104.9305     2.8600 1784.941  < 2e-16 ***
## PC1          -65.1737     0.8836  -73.757  < 2e-16 ***
## PC2           17.7810     1.3500   13.171  < 2e-16 ***
## PC3            4.7192     1.6573    2.848  0.00451 ** 
## PC4          -24.0376     2.2665  -10.606  < 2e-16 ***
## PC5           -4.2597     2.7743   -1.535  0.12506    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 6944.475)
## 
##     Null deviance: 45691145  on 848  degrees of freedom
## Residual deviance:  5854193  on 843  degrees of freedom
## AIC: 9927.3
## 
## Number of Fisher Scoring iterations: 2

Here we see that PCs 1,2, and 4 have the strongest effect size and large predictive power for ELA. Based on the characterization of which variables are represented by each PC, this means that temperature, geophysical conditions, and slope (and to a smaller extend catchment length, area, and elevation) are the most important variables. Out of the three PCs, PC1 (temperature variables), are by far the most important with a effect size of -65.17 (Std. error 0.88). PCs 2 (geophysical conditions) and 4 (slope) have effect sizes of 17.78 (Std. error 1.35) and -24.04 (Std. error 2.27) respectively.

# R squared
r.squaredGLMM(m_pca)
##            R2m       R2c
## [1,] 0.8712126 0.8712126
# The R² is 87.12 %
plot(m_pca)

The residual vs fitted plot shows a mostly flat and even distribution of points, with some outlying point at intermediate values. The QQ-plot has skewed tails at both the lower and upper bounds and is substantial for larger values

# Variable effect size visualization
ggcoefstats(x = m_pca, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F)

In the dot-whisker visual representation of the effect sizes and standard errors of our the variables we can clearly see that PCs 1,2, and 4 have the strongest effect size and large predictive power for ELA. Based on the characterization of which variables are represented by each PC, this means that temperature, geophysical conditions, and slope (and to a smaller extend catchment length, area, and elevation) are the most important variables.

PCA dELA model

m_pca_d <- glm(dELA ~ PC1 + PC2 + PC3 + PC4 + PC5,  family = poisson, data = pca_scores_df)
summary(m_pca_d)
## 
## Call:
## glm(formula = dELA ~ PC1 + PC2 + PC3 + PC4 + PC5, family = poisson, 
##     data = pca_scores_df)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -12.3519   -4.1720   -0.8174    2.7615   26.3817  
## 
## Coefficients:
##               Estimate Std. Error  z value Pr(>|z|)    
## (Intercept)  4.5928575  0.0034796 1319.926   <2e-16 ***
## PC1          0.0344310  0.0010108   34.063   <2e-16 ***
## PC2         -0.0290177  0.0015856  -18.301   <2e-16 ***
## PC3          0.0005099  0.0018754    0.272    0.786    
## PC4         -0.0952305  0.0025894  -36.777   <2e-16 ***
## PC5          0.0391401  0.0032913   11.892   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 26056  on 848  degrees of freedom
## Residual deviance: 23131  on 843  degrees of freedom
## AIC: 28476
## 
## Number of Fisher Scoring iterations: 5

Here we see that PCs 4 and 5 have the strongest effect size and large predictive power for dELA. Based on the characterization of which variables are represented by each PC, this means that slope (and to a smaller extend catchment length, area, and elevation), as well as morphological conditions are the most important variables for predicting dELA. Out of the two PCs, PC4 (slope+), is slightly more important with a effect size of -0.095 (Std. error 0.003). PCs 5 (morphology) has an effect size of 0.039 (Std. error 0.0032).

# R squared
r.squaredGLMM(m_pca_d)
## Warning: The null model is correct only if all variables used by the original
## model remain unchanged.
##                 R2m       R2c
## delta     0.7644043 0.7644043
## lognormal 0.7652960 0.7652960
## trigamma  0.7635058 0.7635058
# The R² is 76.44 %
plot(m_pca_d)

The residual vs fitted plot shows a mostly flat and even distribution of points, with some outlying point at intermediate values

# Variable effect size visualization
ggcoefstats(x = m_pca_d, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F)

In the dot-whisker visual representation of the effect sizes and standard errors of our the variables we can see that PCs 4 and 5 have the strongest effect size and large predictive power for dELA. Based on the characterization of which variables are represented by each PC, this means that slope (and to a smaller extend catchment length, area, and elevation) and morphology are the most important variables.

Automated variable selection

For the automated variable variable I focused on variables that I previously found to be strong predictors of ELA, aswell as the de-aggregated forms of temperature and precipitation metrics. While potential interaction between temperature and precipitation variables are likely to exist, these were not included in the saturated model, as just including a reduced list of variables with no interactions, already brought my computer to the edges of its computational limits. Overall, variables where selected based on a priori assumptions on what I thought would be will be more relevant to include, while also factoring in the outputs of my previous models and exploratory analysis.

m_saturated <-
  glm( ELA ~ debris_cov  + morph_type + orientation + length  + P_snow + P_year + P_not_monsoon +                    T_mean_mea.yr + T_mean_monsoon +   T_mean_not_monsoon + Slope_min + Elev_min + Elev_max + Elev_mean, # + T_MIN_mean_monsoon + T_MIN_mean_not_monsoon + T_MAX_mean_monsoon  T_MAX_mean_not_monsoon ; taken out due to computational constraints. Additionally, to get the model to run, area, max/mean slope, t_variability and monsoon precipitation also needed to be removed. I chose these variables, as in previous models and in exploratory analysis they tended have low impacts on ELA.
  data = glacier_scaled,
  family = gaussian) 

summary(m_saturated)
## 
## Call:
## glm(formula = ELA ~ debris_cov + morph_type + orientation + length + 
##     P_snow + P_year + P_not_monsoon + T_mean_mea.yr + T_mean_monsoon + 
##     T_mean_not_monsoon + Slope_min + Elev_min + Elev_max + Elev_mean, 
##     family = gaussian, data = glacier_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -319.00   -22.28    -0.55    21.49   273.08  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)         5.087e+03  8.798e+00 578.204  < 2e-16 ***
## debris_cov          1.875e+01  7.928e+00   2.365 0.018255 *  
## morph_type         -2.507e-01  2.733e+00  -0.092 0.926953    
## orientationN        1.495e+01  5.812e+00   2.572 0.010297 *  
## orientationNE       1.537e+01  6.553e+00   2.345 0.019255 *  
## orientationNW       2.370e+01  7.226e+00   3.280 0.001081 ** 
## orientationS        2.711e+01  6.794e+00   3.991 7.17e-05 ***
## orientationSE       6.481e+00  8.088e+00   0.801 0.423155    
## orientationSW       2.608e+01  7.460e+00   3.496 0.000498 ***
## orientationW        2.775e+01  6.691e+00   4.147 3.72e-05 ***
## length             -8.992e+00  3.391e+00  -2.652 0.008158 ** 
## P_snow              6.372e+00  3.453e+00   1.845 0.065359 .  
## P_year              1.664e+01  6.409e+00   2.597 0.009567 ** 
## P_not_monsoon      -3.239e+01  7.727e+00  -4.192 3.06e-05 ***
## T_mean_mea.yr      -4.231e+05  1.411e+07  -0.030 0.976093    
## T_mean_monsoon      1.597e+05  5.329e+06   0.030 0.976091    
## T_mean_not_monsoon  2.646e+05  8.830e+06   0.030 0.976097    
## Slope_min           5.444e+00  1.985e+00   2.742 0.006237 ** 
## Elev_min            1.803e+02  5.926e+00  30.421  < 2e-16 ***
## Elev_max            8.835e+01  4.591e+00  19.244  < 2e-16 ***
## Elev_mean           6.351e+00  7.194e+00   0.883 0.377567    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2401.556)
## 
##     Null deviance: 45691145  on 848  degrees of freedom
## Residual deviance:  1988489  on 828  degrees of freedom
## AIC: 9040.6
## 
## Number of Fisher Scoring iterations: 2
options(na.action = "na.fail") # Required for dredge to run
# Run all possible model permutations and tank according to AIC values
m_dredge <- dredge(m_saturated, beta = F, evaluate = T, rank = AICc)
## Fixed term is "(Intercept)"
options(na.action = "na.omit") # set back to default
head(m_dredge)
## Global model call: glm(formula = ELA ~ debris_cov + morph_type + orientation + length + 
##     P_snow + P_year + P_not_monsoon + T_mean_mea.yr + T_mean_monsoon + 
##     T_mean_not_monsoon + Slope_min + Elev_min + Elev_max + Elev_mean, 
##     family = gaussian, data = glacier_scaled)
## ---
## Model selection table 
##       (Int) dbr_cov Elv_max Elv_men Elv_min    lng orn P_not_mns P_snw P_yer
## 10204  5086   19.26   90.23           183.9 -9.266   +    -30.19 5.733 15.09
## 9948   5086   19.37   91.32           184.9 -9.292   +    -24.51       15.51
## 8156   5086   19.04   90.97           183.6 -9.328   +    -32.47 6.361 16.73
## 14300  5086   19.04   90.97           183.6 -9.328   +    -32.47 6.361 16.73
## 12252  5086   19.04   90.97           183.6 -9.328   +    -32.47 6.361 16.73
## 10208  5086   19.03   88.16   4.689   181.4 -9.070   +    -29.66 5.595 14.72
##       Slp_min T_men_mea.yr T_men_mns T_men_not_mns df    logLik   AICc delta
## 10204   5.427                               -28.51 18 -4498.965 9034.8  0.00
## 9948    5.395                               -30.98 17 -4500.478 9035.7  0.94
## 8156    5.498       -60.91     33.41               19 -4498.702 9036.3  1.57
## 14300   5.498                  10.41        -38.10 19 -4498.702 9036.3  1.57
## 12252   5.498        27.58                  -55.36 19 -4498.702 9036.3  1.57
## 10208   5.348                               -26.91 19 -4498.733 9036.4  1.63
##       weight
## 10204  0.291
## 9948   0.182
## 8156   0.133
## 14300  0.133
## 12252  0.133
## 10208  0.129
## Models ranked by AICc(x)
nrow(m_dredge) # 16384 models in total
## [1] 16384
top_model <- get.models(m_dredge, subset = 1)[[1]]
top_model
## 
## Call:  glm(formula = ELA ~ debris_cov + Elev_max + Elev_min + length + 
##     orientation + P_not_monsoon + P_snow + P_year + Slope_min + 
##     T_mean_not_monsoon + 1, family = gaussian, data = glacier_scaled)
## 
## Coefficients:
##        (Intercept)          debris_cov            Elev_max            Elev_min  
##           5086.266              19.258              90.232             183.873  
##             length        orientationN       orientationNE       orientationNW  
##             -9.266              14.967              15.421              23.393  
##       orientationS       orientationSE       orientationSW        orientationW  
##             27.646               7.058              26.234              27.895  
##      P_not_monsoon              P_snow              P_year           Slope_min  
##            -30.188               5.733              15.093               5.427  
## T_mean_not_monsoon  
##            -28.514  
## 
## Degrees of Freedom: 848 Total (i.e. Null);  832 Residual
## Null Deviance:       45690000 
## Residual Deviance: 1992000   AIC: 9034
# Summarize top model
summary(top_model)
## 
## Call:
## glm(formula = ELA ~ debris_cov + Elev_max + Elev_min + length + 
##     orientation + P_not_monsoon + P_snow + P_year + Slope_min + 
##     T_mean_not_monsoon + 1, family = gaussian, data = glacier_scaled)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -319.02   -22.06    -0.35    21.54   274.51  
## 
## Coefficients:
##                    Estimate Std. Error  t value Pr(>|t|)    
## (Intercept)        5086.266      4.483 1134.445  < 2e-16 ***
## debris_cov           19.258      7.854    2.452 0.014408 *  
## Elev_max             90.232      3.345   26.971  < 2e-16 ***
## Elev_min            183.873      4.494   40.917  < 2e-16 ***
## length               -9.266      3.227   -2.871 0.004192 ** 
## orientationN         14.967      5.797    2.582 0.009994 ** 
## orientationNE        15.421      6.537    2.359 0.018550 *  
## orientationNW        23.393      7.206    3.246 0.001216 ** 
## orientationS         27.646      6.726    4.110 4.34e-05 ***
## orientationSE         7.058      8.052    0.877 0.381004    
## orientationSW        26.234      7.412    3.539 0.000424 ***
## orientationW         27.895      6.616    4.216 2.76e-05 ***
## P_not_monsoon       -30.188      6.948   -4.345 1.57e-05 ***
## P_snow                5.733      3.327    1.723 0.085203 .  
## P_year               15.093      5.881    2.566 0.010454 *  
## Slope_min             5.427      1.950    2.783 0.005505 ** 
## T_mean_not_monsoon  -28.514      4.442   -6.419 2.31e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 2393.755)
## 
##     Null deviance: 45691145  on 848  degrees of freedom
## Residual deviance:  1991604  on 832  degrees of freedom
## AIC: 9033.9
## 
## Number of Fisher Scoring iterations: 2

The most important fixed effects where elevation variables and and the various orientations. A more detailed description and visualization can be found in the dot and whisker plot at the end of this section.

# psudo R^2
r.squaredGLMM(top_model) # 95.74%
##            R2m       R2c
## [1,] 0.9556106 0.9556106
plot(top_model)

The residual vs fitted plot shows a mostly flat and even distribution of points, with greater diviances seen at the lower spectrum of points, likely due to the negative skew described above

When checking the QQ-plot it is seen that there is a left tail at lower bonds and a right tail at the upper bonds. Following an ad hoc transformation of the dependent variable ELA, this observed trend only showed a very marginal improvement, so I decided to keep ELA in its un-transformed state.

# Variable effect size visualization
ggcoefstats(x = top_model, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F)

The automated model section based on AIC ranking of 16384 models, indicated that min and max elevation have are the two most important variables to predict ELA (with effect sizes 183.87 and 90.23 respectively & with small std.errors of 4.49 and 3.34), with large effect sizes relative to other variables. The other next most important variables where the various orientation directions, as well as precipitation and temperature in non monsoon periods. A notable orison is mean yearly temperature, which in the glmer model had the highest effect size. This could be attributed to high co-linearity between mean temperature and temperature in non monsoon periods, but with temperature in non monsoon periods have a large effect size and therefor being prioritized in the selection.

Automated variable selection dELA

m_saturated_d <-
  glm( dELA ~ debris_cov  + morph_type + orientation + length + P_year + P_not_monsoon + T_mean_mea.yr + T_mean_monsoon +   T_mean_not_monsoon + Slope_min + Slope_mean + Elev_min + Elev_max + Elev_mean, # + T_MIN_mean_monsoon + T_MIN_mean_not_monsoon + T_MAX_mean_monsoon  T_MAX_mean_not_monsoon ; taken out due to computational constraints. Additionally, to get the model to run, area, max slope, t_variability snow, and monsoon precipitation also needed to be removed. I chose these variables, as in previous models and in exploratory analysis they tended have low impacts on ELA.
  data = glacier_scaled,
  family = poisson) 

summary(m_saturated_d)
## 
## Call:
## glm(formula = dELA ~ debris_cov + morph_type + orientation + 
##     length + P_year + P_not_monsoon + T_mean_mea.yr + T_mean_monsoon + 
##     T_mean_not_monsoon + Slope_min + Slope_mean + Elev_min + 
##     Elev_max + Elev_mean, family = poisson, data = glacier_scaled)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -13.0002   -3.8463   -0.6707    2.7638   23.1736  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         4.606e+00  1.829e-02 251.793  < 2e-16 ***
## debris_cov         -2.407e-01  1.596e-02 -15.084  < 2e-16 ***
## morph_type         -2.529e-02  5.659e-03  -4.469 7.84e-06 ***
## orientationN       -3.527e-02  1.248e-02  -2.825  0.00473 ** 
## orientationNE      -1.955e-02  1.404e-02  -1.392  0.16378    
## orientationNW      -1.892e-02  1.560e-02  -1.213  0.22519    
## orientationS        2.311e-01  1.363e-02  16.960  < 2e-16 ***
## orientationSE       5.327e-02  1.687e-02   3.158  0.00159 ** 
## orientationSW       2.150e-01  1.482e-02  14.504  < 2e-16 ***
## orientationW        1.781e-01  1.367e-02  13.029  < 2e-16 ***
## length              5.331e-02  6.913e-03   7.712 1.24e-14 ***
## P_year              3.890e-02  1.308e-02   2.975  0.00293 ** 
## P_not_monsoon      -5.807e-02  1.363e-02  -4.259 2.05e-05 ***
## T_mean_mea.yr       7.064e+04  2.970e+04   2.379  0.01737 *  
## T_mean_monsoon     -2.667e+04  1.121e+04  -2.379  0.01737 *  
## T_mean_not_monsoon -4.419e+04  1.858e+04  -2.379  0.01737 *  
## Slope_min          -3.609e-02  4.563e-03  -7.909 2.59e-15 ***
## Slope_mean          1.237e-01  5.366e-03  23.052  < 2e-16 ***
## Elev_min            2.705e-01  1.196e-02  22.623  < 2e-16 ***
## Elev_max            1.673e-01  9.120e-03  18.340  < 2e-16 ***
## Elev_mean          -1.754e-01  1.372e-02 -12.788  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 26056  on 848  degrees of freedom
## Residual deviance: 20831  on 828  degrees of freedom
## AIC: 26205
## 
## Number of Fisher Scoring iterations: 4
options(na.action = "na.fail") # Required for dredge to run
# Run all possible model permutations and tank according to AIC values
model_dredge_d <- dredge(m_saturated_d, beta = F, evaluate = T, rank = AICc)
## Fixed term is "(Intercept)"
options(na.action = "na.omit") # set back to default
head(model_dredge_d)
## Global model call: glm(formula = dELA ~ debris_cov + morph_type + orientation + 
##     length + P_year + P_not_monsoon + T_mean_mea.yr + T_mean_monsoon + 
##     T_mean_not_monsoon + Slope_min + Slope_mean + Elev_min + 
##     Elev_max + Elev_mean, family = poisson, data = glacier_scaled)
## ---
## Model selection table 
##       (Int) dbr_cov Elv_max Elv_men Elv_min     lng  mrp_typ orn P_not_mns
## 16384 4.606 -0.2407  0.1673 -0.1754  0.2705 0.05331 -0.02529   +  -0.05807
## 8192  4.607 -0.2391  0.1667 -0.1750  0.2699 0.05355 -0.02607   +  -0.05847
## 14336 4.607 -0.2391  0.1667 -0.1750  0.2699 0.05355 -0.02607   +  -0.05847
## 12288 4.607 -0.2391  0.1667 -0.1750  0.2699 0.05355 -0.02607   +  -0.05847
## 16128 4.615 -0.2421  0.1661 -0.1765  0.2732 0.05669 -0.02816   +  -0.01951
## 7936  4.616 -0.2405  0.1656 -0.1761  0.2727 0.05693 -0.02892   +  -0.02006
##         P_yer Slp_men  Slp_min T_men_mea.yr  T_men_mns T_men_not_mns df
## 16384 0.03890  0.1237 -0.03609   70640.0000 -2.667e+04    -4.419e+04 21
## 8192  0.03875  0.1236 -0.03582       0.6414 -4.445e-01               20
## 14336 0.03875  0.1236 -0.03582              -2.023e-01     4.012e-01 20
## 12288 0.03875  0.1236 -0.03582      -0.5360                7.365e-01 20
## 16128          0.1254 -0.03725   70240.0000 -2.652e+04    -4.394e+04 20
## 7936           0.1253 -0.03698       0.6898 -4.971e-01               19
##          logLik    AICc delta weight
## 16384 -13081.53 26206.2  0.00  0.646
## 8192  -13084.36 26209.7  3.56  0.109
## 14336 -13084.36 26209.7  3.56  0.109
## 12288 -13084.36 26209.7  3.56  0.109
## 16128 -13085.96 26212.9  6.75  0.022
## 7936  -13088.75 26216.4 10.24  0.004
## Models ranked by AICc(x)
nrow(model_dredge_d) # 16384 models in total
## [1] 16384
top_model_d <- get.models(model_dredge_d, subset = 1)[[1]]
top_model_d
## 
## Call:  glm(formula = dELA ~ debris_cov + Elev_max + Elev_mean + Elev_min + 
##     length + morph_type + orientation + P_not_monsoon + P_year + 
##     Slope_mean + Slope_min + T_mean_mea.yr + T_mean_monsoon + 
##     T_mean_not_monsoon + 1, family = poisson, data = glacier_scaled)
## 
## Coefficients:
##        (Intercept)          debris_cov            Elev_max           Elev_mean  
##          4.606e+00          -2.407e-01           1.673e-01          -1.754e-01  
##           Elev_min              length          morph_type        orientationN  
##          2.705e-01           5.331e-02          -2.529e-02          -3.527e-02  
##      orientationNE       orientationNW        orientationS       orientationSE  
##         -1.955e-02          -1.892e-02           2.311e-01           5.327e-02  
##      orientationSW        orientationW       P_not_monsoon              P_year  
##          2.150e-01           1.781e-01          -5.807e-02           3.890e-02  
##         Slope_mean           Slope_min       T_mean_mea.yr      T_mean_monsoon  
##          1.237e-01          -3.609e-02           7.064e+04          -2.667e+04  
## T_mean_not_monsoon  
##         -4.419e+04  
## 
## Degrees of Freedom: 848 Total (i.e. Null);  828 Residual
## Null Deviance:       26060 
## Residual Deviance: 20830     AIC: 26210
# Summarize top model
summary(top_model_d)
## 
## Call:
## glm(formula = dELA ~ debris_cov + Elev_max + Elev_mean + Elev_min + 
##     length + morph_type + orientation + P_not_monsoon + P_year + 
##     Slope_mean + Slope_min + T_mean_mea.yr + T_mean_monsoon + 
##     T_mean_not_monsoon + 1, family = poisson, data = glacier_scaled)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -13.0002   -3.8463   -0.6707    2.7638   23.1736  
## 
## Coefficients:
##                      Estimate Std. Error z value Pr(>|z|)    
## (Intercept)         4.606e+00  1.829e-02 251.793  < 2e-16 ***
## debris_cov         -2.407e-01  1.596e-02 -15.084  < 2e-16 ***
## Elev_max            1.673e-01  9.120e-03  18.340  < 2e-16 ***
## Elev_mean          -1.754e-01  1.372e-02 -12.788  < 2e-16 ***
## Elev_min            2.705e-01  1.196e-02  22.623  < 2e-16 ***
## length              5.331e-02  6.913e-03   7.712 1.24e-14 ***
## morph_type         -2.529e-02  5.659e-03  -4.469 7.84e-06 ***
## orientationN       -3.527e-02  1.248e-02  -2.825  0.00473 ** 
## orientationNE      -1.955e-02  1.404e-02  -1.392  0.16378    
## orientationNW      -1.892e-02  1.560e-02  -1.213  0.22519    
## orientationS        2.311e-01  1.363e-02  16.960  < 2e-16 ***
## orientationSE       5.327e-02  1.687e-02   3.158  0.00159 ** 
## orientationSW       2.150e-01  1.482e-02  14.504  < 2e-16 ***
## orientationW        1.781e-01  1.367e-02  13.029  < 2e-16 ***
## P_not_monsoon      -5.807e-02  1.363e-02  -4.259 2.05e-05 ***
## P_year              3.890e-02  1.308e-02   2.975  0.00293 ** 
## Slope_mean          1.237e-01  5.366e-03  23.052  < 2e-16 ***
## Slope_min          -3.609e-02  4.563e-03  -7.909 2.59e-15 ***
## T_mean_mea.yr       7.064e+04  2.970e+04   2.379  0.01737 *  
## T_mean_monsoon     -2.667e+04  1.121e+04  -2.379  0.01737 *  
## T_mean_not_monsoon -4.419e+04  1.858e+04  -2.379  0.01737 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 26056  on 848  degrees of freedom
## Residual deviance: 20831  on 828  degrees of freedom
## AIC: 26205
## 
## Number of Fisher Scoring iterations: 4

The most important fixed effects where mean annual temperature and mean temperature during monsoon phases. A more detailed description and visualization can be found in the dot and whisker plot at the end of this section.

# psudo R^2
r.squaredGLMM(top_model_d)  # 83.01%
## Warning: The null model is correct only if all variables used by the original
## model remain unchanged.
##                 R2m       R2c
## delta     0.8482790 0.8482790
## lognormal 0.8489160 0.8489160
## trigamma  0.8476367 0.8476367
plot(top_model_d)

The residual vs fitted plot shows a mostly flat and even distribution of points, with greater deviances seen at the lower spectrum of points, likely due to the negative skew described above

When checking the QQ-plot it is seen that there is a left tail at lower bonds and a right tail at the upper bonds. Following an ad hoc transformation of the dependent variable ELA, this observed trend only showed a very marginal improvement, so I decided to keep ELA in its un-transformed state.

# Variable effect size visualization
ggcoefstats(x = top_model_d, conf.int = TRUE,
  conf.level = 0.95, conf.method = "HPDinterval", exclude.intercept = T, stats.labels =F)

In the dot-whisker visual representation of the effect sizes and standard errors of our the variables we can see that mean elevation and mean annual temperature are the two most influential variables at predicting dELA (although the are also accompanied by large standard errors). These are then followed by precipitation the various available elevation variables, as well as the debris cover, which was already found to be relevant in predicting dELA in the hierarchical GLM and baysian models.

Compairisons and conclusions

(BIC_ELA <- BIC(m_lme4, m_bays, m_pca, top_model) %>% 
  arrange(BIC))
##           df      BIC
## top_model 18 9119.323
## m_lme4    15 9657.006
## m_bays    15 9723.412
## m_pca      7 9960.545
(BIC_dELA <- BIC(m_lme4_d, m_bays_d, m_pca_d, top_model_d) %>% 
  arrange(BIC))
##             df       BIC
## m_bays_d    15  6308.751
## top_model_d 21 26304.696
## m_lme4_d    14 26733.776
## m_pca_d      6 28504.261
# initial AIC ranking that I abandoned in favour of using BIC
#AICc_df <- AICc(m_lme4, m_lme4_d, m_bays, m_bays_d, m_pca, m_pca_d, top_model, top_model_d) %>% 
#  arrange(AICc)
#AICc_df

For the between model comparison, I decided to use the Bayesian Information Criterion (BIC), as opposed to the Akaike`s information criterion (c). Using the BIC comes with a number of advantages such as assessing parsimony of different model types and model structures more consistently, and being more tolerant of models with slight under-fitting (likely to be the case in some of my models, as residual deviation was sometimes large at low and high end of the distribution). Furthermore BIC is more selective for the total number of variables included in the model and has a stronger negative weight associated to number of parameters. Applying greater penalization to models with many variables seems like a reasonable approach for this analysis, as all models had large sets of included variables, increasing the risk of over-fitting. A final argument for using BIC is that this has the potential to reduce the arbitrary methodological advantage of the automated variable selection methods, As the automated band selection methods optimize for AIC. Using BIC as the final assessment criterion allows for a slightly fairer comparison between the models. As there is a slight differentiation between AIC and BIC, this means that the automated selection models are not fully optimized for the assessment criterion.

Based on a compared BIC ranking between the models constructed under my four modeling approaches, it is seen that automated variable selection performed best for predicting both the current ELA. While I choice BIC as the final assessment criterion, it still unsurprising that the automated band selection methods performed the best, as they where optimized to have low parsimony. Despite the automated variable selection method practically being the “best” model, in terms of my personal confidence in the found results and overall reproduceability, I would favor the Bayesian model. The Bayesian models yielded an acceptably low BIC values for both ELA, and comes with the conceptual advantage of having its variables selected based on a priori knowledge/guesses of what makes mechanistic sense to include. The reason that the Bayesian model for dELA is likely to have a vastly lower BIC is that it may handle being defined as a Poisson distribution model differently than the frequentest approaches. It is possible that my initial choice of using a Poisson as opposed to a inverse Gaussian was sub-optimal, therefor resulting in the inflated BIC values. While unlikely based on my understanding of how BIC is calculated, an alternative explanation could be that when using a Poisson distribution the underlying math for calculating BIC, and that the inflated BIC values are a calculation artifact and not a result of using the wrong distribution for the dELA models.

Regardless of what modeling approach is preferred, synthesizing the results of all my models helps provide more robust answers to the main research questions.

1. Which are the most important topographical and climatological drivers of glacier equilibrium line altitude (ELA)?

Synthesizing the results from all my modeling approaches, (while favoring my a priori defined hierarchical models), I have identified mean annual temperature and mean elevation and as the two most important variables for predicting ELA. Despite the automated selection method choosing temperature in non monsoon phases over mean annual temperature, I believe that is likely an artifact of high degree of co-linearity between the variables and the automated selection method being highly sensitive to marginal deviation in effect size. Overall though, both Philosophical modeling approaches converge on the fact that temperature values averaged across longer timescales are critical to predicting ELA. Mean precipitation in non monsoon periods and temperature variability are generally the next two most important variables and where consistently important predictors across all modeling approaches. Overall, when considering the results of my models I conclude that mean annual temperature and mean elevation and as the two most important variables for predicting ELA, while mean precipitation in non monsoon periods and temperature variability being the two strongest key secondary variables.

2. Can we explain ELA changes since the Little Ice Age with topographical and climatological variables?

Based on the synthesized results of all my models I believe that ELA changes since the Little Ice Age can be reasonably predicted using climatological and topographical variables. As when attempting to predict ELA, all models converged on mean annual temperature being a critical variable when predicting dELA. Furthermore both the hierarchical and automated section models identified debris cover as a variable with large influence on dELA. Additional variables that both modeling approaches identified as being particularly useful for predicting dELA where precipitation in non monsoon periods, and the various available elevation metrics. Overall, when considering the results of my models I conclude that mean annual temperature and debris cover and as the two most important variables for predicting dELA, while mean precipitation in non monsoon periods and catchment elevation being the two strongest key secondary variables.

Remarks on ELA and dELA shared characteristics

Both glacier equilibrium line altitude and changes in ELA since the Little Ice Age can be explained with with topographical climatological variables. While the importance and strength of topographical and climatological variables for predicting ELA and dELA differ, it is notable that key variables are particularly important for predicting both. I found that Mean annual temperature, mean elevation, and precipitation in non monsoon periods (correlated with snow fall) to universally being strong predictors of both ELA and dELA, across modeling approaches. This makes mechanistic/conceptual sense and as i) snow fall amount, ii) temperature, and iii) elevation are key variables that respectively describe the core physical glaciation process of i) glacier mass balance accumulation, ii) glacier mass balance decline, iii) the core environmental conditions that define glacier stability. While I don not want to overinterprete the significance of these variables, without a expert background knowledge of the nuanced geophysical processes of glacier dynamics, the apparent mechanistic logic of these variables being identified as being key predictors, provides some additional confidence in the validity of the modeling decisions and approaches made during this analysis.